MEMORY SHARED BETWEEN PROCESSING THREADS
BACKGROUND 

The invention relates to memory shared between 
processing threads. 

A computer thread is a sequence or stream of 
computer instructions that performs a task. A computer 
thread is associated with a set of resources or a 
context.

SUMMARY 

In one general aspect of the invention, a method 
includes pushing a datum onto a stack by a first 
processor and popping the datum off the stack by a
second processor.

Advantages and other features of the invention will 
become apparent from the following description and from 
the claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 is a block diagram of a system employing a
hardware-based multithreaded processor.

FIG. 2 is a block diagram of a MicroEngine employed
in the hardware-based multithreaded processor of FIG. 1.
FIG. 3 is a block diagram showing instruction sets
of two threads that are executed on the MicroEngines of 
FIGS. 1 and 2. 

FIG. 4 is a simplified block diagram of the system
of FIG. 1 showing selected subsystems of the processor
including a stack module.

FIG. 5A is a block diagram showing the memory 
components of the stack module of FIG. 4. 

FIG. 5B is a block diagram showing the memory
components of an alternate implementation of the stack
module of FIG. 4.

FIG. 6A is a flow chart of the process of popping a 
datum from the memory components of FIG. 5A. 

FIG. 6B is a block diagram showing the memory
components of FIG. 5A after the popping process of FIG.
6A.

FIG. 7A is a flow chart of the process of pushing a 
datum on the memory components of FIG. 6B. 

FIG. 7B is a block diagram showing the memory
components of FIG. 6B after the pushing process of FIG.
7A.

FIG. 8 is a block diagram showing memory components
used to implement two stacks in one stack module.



DETAILED DESCRIPTION
Referring to FIG. 1, a system 10 includes a
parallel, hardware-based multithreaded processor 12. The
hardware-based multithreaded processor 12 is coupled to a
bus 14, a memory system 16 and a second bus 18. The bus 
14 complies with the Peripheral Component Interconnect 
Interface, revision 2.1, issued June 1, 1995 (PCI). The 
system 10 is especially useful for tasks that can be 
broken into parallel subtasks or functions. Specifically,
the hardware-based multithreaded processor 12 is useful for
tasks that are bandwidth oriented rather than latency
oriented. The hardware-based multithreaded processor 12
has multiple MicroEngines 22 each with multiple hardware 
controlled threads that can be simultaneously active and 
independently work on a task. 

The hardware-based multithreaded processor 12
also includes a central controller 20 that assists in
loading microcode control for other resources of the
hardware-based multithreaded processor 12 and performs
other general-purpose computer-type functions such as
handling protocols, exceptions, and extra support for
packet processing where the MicroEngines pass the packets
off for more detailed processing such as in boundary
conditions. In one embodiment, the processor 20 is a
StrongArm (TM) (StrongArm is a trademark of ARM Limited,
United Kingdom) based architecture. The general-purpose
microprocessor 20 has an operating system. Through the
operating system, the processor 20 can call functions to
operate on MicroEngines 22a-22f. The processor 20 can
use any supported operating system, preferably a real-time
operating system. For the core processor implemented as
a StrongArm architecture, operating systems such as
Microsoft NT real-time, VXWorks, and μC/OS, a freeware
operating system available over the Internet at
http://www.ucos-ii.com/, can be used.

The hardware-based multithreaded processor 12
also includes a plurality of functional MicroEngines
22a-22f. Functional MicroEngines (MicroEngines) 22a-22f each
maintain a plurality of program counters in hardware and
states associated with the program counters.
Effectively, a corresponding plurality of sets of threads
can be simultaneously active on each of the MicroEngines
22a-22f while only one is actually operating at any one
time.

In one embodiment, there are six MicroEngines
22a-22f as shown. Each MicroEngine 22a-22f has
the capability to process four hardware threads. The
six MicroEngines 22a-22f operate with shared resources
including memory system 16 and bus interfaces 24 and 28.
The memory system 16 includes a Synchronous Dynamic
Random Access Memory (SDRAM) controller 26a and a Static
Random Access Memory (SRAM) controller 26b. SDRAM memory
16a and SDRAM controller 26a are typically used for
processing large volumes of data, e.g., processing of
network payloads from network packets. The SRAM
controller 26b and SRAM memory 16b are used in a
networking implementation for low-latency, fast-access
tasks, e.g., accessing look-up tables, memory for the
core processor 20, and so forth.
The six MicroEngines 22a-22f access either the
SDRAM 16a or SRAM 16b based on characteristics of the
data. Thus, low-latency, low-bandwidth data is stored in
and fetched from SRAM, whereas higher-bandwidth data for
which latency is not as important is stored in and
fetched from SDRAM. The MicroEngines 22a-22f can execute
memory reference instructions to either the SDRAM
controller 26a or the SRAM controller 26b.

Advantages of hardware multithreading can be
explained by SRAM or SDRAM memory accesses. As an
example, an SRAM access requested by a Thread_0 from a
MicroEngine will cause the SRAM controller 26b to
initiate an access to the SRAM memory 16b. The SRAM
controller controls arbitration for the SRAM bus,
accesses the SRAM 16b, fetches the data from the SRAM
16b, and returns data to a requesting MicroEngine 22a-22b.
During an SRAM access, if the MicroEngine, e.g., 22a,
had only a single thread that could operate, that
MicroEngine would be dormant until data was returned from
the SRAM. Hardware context swapping within each of the
MicroEngines 22a-22f enables other contexts with unique
program counters to execute in that same MicroEngine. Thus,
another thread, e.g., Thread_1, can function while the
first thread, e.g., Thread_0, is awaiting the read data
to return. During execution, Thread_1 may access the
SDRAM memory 16a. While Thread_1 operates on the SDRAM
unit, and Thread_0 is operating on the SRAM unit, a new
thread, e.g., Thread_2, can now operate in the MicroEngine
22a. Thread_2 can operate for a certain amount of time
until it needs to access memory or perform some other
long-latency operation, such as making an access to a bus
interface. Therefore, simultaneously, the processor 12
can have a bus operation, an SRAM operation and an SDRAM
operation all being completed or operated upon by one
MicroEngine 22a and have one more thread available to
process more work in the data path.

The hardware context swapping also synchronizes
completion of tasks. For example, two threads could hit
the same shared resource, e.g., SRAM. Each of the
separate functional units, e.g., the FBUS interface 28,
the SRAM controller 26b, and the SDRAM controller 26a,
reports back a flag signaling completion of an operation
when it completes a task requested by one of the
MicroEngine thread contexts. When the MicroEngine
receives the flag, the MicroEngine can determine which
thread to turn on.

One example of an application for the hardware-based
multithreaded processor 12 is as a network
processor. As a network processor, the hardware-based
multithreaded processor 12 interfaces to network devices
such as a media access controller device, e.g., a
10/100BaseT Octal MAC 13a or a Gigabit Ethernet device
13b. The Gigabit Ethernet device 13b complies with the
IEEE 802.3z standard, approved in June 1998. In general,
as a network processor, the hardware-based multithreaded
processor 12 can interface to any type of communication
device or interface that receives/sends large amounts of
data. Communication system 10 functioning in a
networking application could receive a plurality of
network packets from the devices 13a, 13b and process
those packets in a parallel manner. With the hardware-based
multithreaded processor 12, each network packet can
be independently processed.

Another example for use of processor 12 is as a
print engine for a PostScript processor or as a processor
for a storage subsystem, e.g., RAID disk storage. A
further use is as a matching engine. In the securities
industry, for example, the advent of electronic trading
requires the use of electronic matching engines to match
orders between buyers and sellers. These and other
parallel types of tasks can be accomplished on the system 10.

The processor 12 includes a bus interface 28
that couples the processor to the second bus 18. Bus
interface 28 in one embodiment couples the processor 12
to the so-called FBUS 18 (FIFO bus). The FBUS interface
28 is responsible for controlling and interfacing the
processor 12 to the FBUS 18. The FBUS 18 is a 64-bit
wide FIFO bus, used to interface to Media Access
Controller (MAC) devices.

The processor 12 includes a second interface,
e.g., a PCI bus interface 24, that couples other system
components that reside on the PCI bus 14 to the processor
12. The PCI bus interface 24 provides a high-speed data
path 24a to memory 16, e.g., the SDRAM memory 16a.
Through that path, data can be moved quickly from the
SDRAM 16a through the PCI bus 14 via direct memory
access (DMA) transfers. The hardware-based multithreaded
processor 12 supports image transfers. The hardware-based
multithreaded processor 12 can employ a plurality
of DMA channels, so if one target of a DMA transfer is
busy, another one of the DMA channels can take over the
PCI bus to deliver information to another target to
maintain high processor 12 efficiency. Additionally, the
PCI bus interface 24 supports target and master
operations. Target operations are operations where slave
devices on bus 14 access SDRAMs through reads and writes
that are serviced as a slave to the target operation. In
master operations, the processor core 20 sends data
directly to or receives data directly from the PCI
interface 24.

Each of the functional units is coupled to one
or more internal buses. The internal buses are dual,
32-bit buses (i.e., one bus for read and one for write).
The hardware-based multithreaded processor 12 also is
constructed such that the sum of the bandwidths of the
internal buses in the processor 12 exceeds the bandwidth
of external buses coupled to the processor 12. The
processor 12 includes an internal core processor bus 32,
e.g., an ASB bus (Advanced System Bus), that couples the
processor core 20 to the memory controllers 26a, 26b and
to an ASB translator 30. The ASB bus is a subset of the
so-called AMBA bus that is used with the StrongArm
processor core. The processor 12 also includes a private
bus 34 that couples the MicroEngine units to SRAM
controller 26b, ASB translator 30 and FBUS interface 28.
A memory bus 38 couples the memory controllers 26a, 26b
to the bus interfaces 24 and 28 and memory system 16,
including flash ROM 16c used for boot operations and so
forth.

Referring to FIG. 2, an exemplary one of the
MicroEngines 22a-22f, e.g., MicroEngine 22f, is shown.
The MicroEngine includes a control store 70, which, in
one implementation, includes a RAM of 1,024 32-bit words.
The RAM stores a microprogram. The microprogram
is loadable by the core processor 20. The MicroEngine
22f also includes controller logic 72. The controller
logic includes an instruction decoder 73 and program
counter (PC) units 72a-72d. The four microprogram
counters 72a-72d are maintained in hardware. The
MicroEngine 22f also includes context event switching
logic 74. Context event logic 74 receives messages
(e.g., SEQ_#_EVENT_RESPONSE; FBI_EVENT_RESPONSE;
SRAM_EVENT_RESPONSE; SDRAM_EVENT_RESPONSE; and
ASB_EVENT_RESPONSE) from each one of the shared resources,
e.g., the SRAM controller 26b, the SDRAM controller 26a,
the processor core 20, control and status registers, and
so forth. These messages provide information on whether
a requested function has completed. A thread that has
requested a function waits for the corresponding
completion signal; once the function has completed and
signaled completion, and if the thread is enabled to
operate, the thread is placed on an available thread list
(not shown). The MicroEngine 22f can have a maximum of,
e.g., four threads available.

In addition to event signals that are local to
an executing thread, the MicroEngines 22 employ signaling
states that are global. With signaling states, an
executing thread can broadcast a signal state to all
MicroEngines 22, e.g., a Receive Request Available signal.
Any and all threads in the MicroEngines can branch on these
signaling states. These signaling states can be used to
determine availability of a resource or whether a
resource is due for servicing.

The context event logic 74 has arbitration for 
the four (4) threads. In one embodiment, the arbitration 
is a round robin mechanism. Other techniques could be 
used including priority queuing or weighted fair queuing. 
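As an illustration only, the round-robin choice among the four threads can be sketched in C. The function name next_thread, the ready-flag array, and the convention of returning -1 when no thread is ready are assumptions made for this sketch, not details of the MicroEngine hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_THREADS 4

/* Round-robin pick (illustrative): starting with the thread after `last`,
 * return the first thread whose ready flag is set, or -1 if no thread is
 * ready. Pass last = -1 when no thread has run yet. */
static int next_thread(const bool ready[NUM_THREADS], int last)
{
    assert(last >= -1 && last < NUM_THREADS);
    for (int i = 1; i <= NUM_THREADS; i++) {
        int t = (last + i) % NUM_THREADS;   /* wraps from 3 back to 0 */
        if (ready[t])
            return t;
    }
    return -1;
}
```

Because the search starts just after the last thread served, a thread that stays ready is not starved by a lower-numbered one; a priority or weighted fair queuing arbiter would replace this loop.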

The MicroEngine 22f also includes an execution box (EBOX)
data path 76 that includes an arithmetic logic unit 76a
and a general-purpose register set 76b. The arithmetic
logic unit 76a performs arithmetic and logical functions
as well as shift functions. The register set 76b has a
relatively large number of general-purpose registers. In
this implementation, there are 64 general-purpose registers
in a first bank, Bank A, and 64 in a second bank, Bank B.
The general-purpose registers are windowed so that they
are relatively and absolutely addressable.

The MicroEngine 22f also includes a write
transfer register 78 and a read transfer register 80. These
registers are also windowed so that they are relatively
and absolutely addressable. Write transfer register 78
is where write data to a resource is located. Similarly,
read transfer register 80 is for return data from a shared
resource. Subsequent to or concurrent with data arrival,
an event signal from the respective shared resource, e.g.,
the SRAM controller 26b, SDRAM controller 26a or core
processor 20, will be provided to context event arbiter 74,
which will then alert the thread that the data is
available or has been sent. Both transfer register banks
78 and 80 are connected to the execution box (EBOX) 76
through a data path. In one implementation, the read
transfer register has 64 registers and the write transfer
register has 64 registers.

Referring to FIG. 3, processor 12 has processing
threads 41 and 42 executing in MicroEngines 22a and 22b,
respectively. In other instances, the threads 41 and 42
may be executed on the same MicroEngine. The processing
threads may or may not share data between them. For
example, in FIG. 3, processing thread 41 receives data 43
and processes it to produce data 44. Processing thread
42 receives and processes the data 44 to produce output
data 45. Threads 41 and 42 are concurrently active.
Because the MicroEngines 22a and 22b share SDRAM 16a
and SRAM 16b (memory), one MicroEngine, e.g., 22a, may
need to designate sections of memory for its exclusive
use. To facilitate efficient allocation of memory
sections, the SDRAM memory is divided into memory
segments, referred to as buffers. The memory locations in
a buffer share a common address prefix, or pointer. The
pointer is used by the processor as an identifier for a
buffer.
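A minimal sketch of the prefix (pointer) and suffix relationship follows, assuming a hypothetical 16-bit address whose upper 8 bits are the buffer pointer and whose lower 8 bits select a location within the buffer. The widths and the helper names buffer_address and buffer_pointer_of are illustrative assumptions, not part of the described hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: upper 8 bits = buffer pointer (common prefix),
 * lower 8 bits = suffix selecting a location inside that buffer. */
#define SUFFIX_BITS 8u

/* Form a full address by appending a suffix to a buffer pointer. */
static uint16_t buffer_address(uint8_t pointer, uint8_t suffix)
{
    return (uint16_t)(((uint16_t)pointer << SUFFIX_BITS) | suffix);
}

/* Recover the buffer pointer (the shared prefix) from a full address. */
static uint8_t buffer_pointer_of(uint16_t address)
{
    return (uint8_t)(address >> SUFFIX_BITS);
}
```

For example, location 0x10 of the buffer whose pointer is 0xC5 has the full address 0xC510, and every address from 0xC500 to 0xC5FF maps back to pointer 0xC5.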

Pointers to buffers that are not currently in use by 
a processing thread are managed by pushing the pointers 
onto a free memory stack. A thread can allocate a buffer 
for use by the thread by popping a pointer off the stack, 
and using the pointer to access the corresponding buffer. 
When a processing thread no longer needs a buffer that is 
allocated to the processing thread, the thread pushes the 
pointer to the buffer onto the stack to make the buffer 
available to other threads.
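The allocate/free discipline above can be sketched as follows. This is a hypothetical single-threaded model: the stack module described in this document keeps the free pointers in a linked list in SRAM, whereas this sketch uses a fixed-size array for brevity, and the names free_stack, alloc_buffer and free_buffer are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FREE 16

/* Pointers to buffers not currently in use (illustrative array stack). */
typedef struct {
    uint8_t ptr[MAX_FREE];
    size_t top;              /* number of pointers currently on the stack */
} free_stack;

/* Release a buffer: push its pointer so other threads can allocate it. */
static void free_buffer(free_stack *s, uint8_t p)
{
    assert(s->top < MAX_FREE);
    s->ptr[s->top++] = p;
}

/* Allocate a buffer: pop a pointer. Returns 0x00 (treated here as an
 * invalid pointer) when no buffer is free. */
static uint8_t alloc_buffer(free_stack *s)
{
    if (s->top == 0)
        return 0x00;
    return s->ptr[--s->top];
}
```

The most recently freed buffer is the first one handed out again, which is the last-in, first-out behavior of a stack.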

The threads 41 and 42 have processor instruction
sets 46, 47 that respectively include a "PUSH" 46a and a
"POP" 47a instruction. Upon executing either the
"PUSH" or the "POP" instruction, the instruction is
transmitted to a logical stack module 56 (FIG. 4).

Referring to FIG. 4, a section of the processor 12
and SRAM 16b provide the logical stack module 56. The
logical stack module is implemented as a linked list of
SRAM addresses. Each SRAM address on the linked list
contains the address of the next item on the list. As a
result, given the address of the first item on the list,
the contents of that address can be read to find the
address of the next item on the list, and so on.
Additionally, each address on the linked list is
associated with a corresponding memory buffer. Thus, the
stack module 56 is used to implement a linked list of
memory buffers. While in use, the linked list allows the
stack to increase or decrease in size as needed.

The stack module 56 includes control logic 51 on the
SRAM unit 26b. The control logic 51 performs the
necessary operations on the stack while SRAM 16b stores
the contents of the stack. One of the SRAM registers 50 is
used to store the address of the first SRAM location on
the stack. The address is also a pointer to the first
buffer on the stack.

Although the different components of the stack
module 56 and the threads will be explained using an
example that uses hardware threads and stack modules, the
stack can also be implemented in operating system
software threads using software modules. Thread 41 and
thread 42 may be implemented as two operating system
threads which execute "PUSH" and "POP" operating
system commands to allocate memory from a shared memory
pool. The operating system commands may include calls to
a library of functions written in the "C" programming
language. In the operating system example, the
equivalents of the control logic 51, the SRAM registers
50 and SRAM 16b are implemented using software within the
operating system. The software may be stored on a hard
disk, a floppy disk, computer memory, or other
computer-readable medium.

Referring to FIG. 5A, SRAM register Q1 stores an
address (0xC5) of the first item on the stack 60. The
SRAM location (0xC5) of the first item on the stack 60 is
used to store the SRAM address (0xA1) of the second item
on the stack 60. The SRAM location (0xA1) of the second
item on the stack 60 is used to store the address of the
third item on the stack 60, etc. The SRAM location
(0xE9) of the last item on the stack stores a
predetermined invalid address (0x00), which indicates the
end of the linked list.
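The FIG. 5A layout can be modelled with a small array standing in for SRAM 16b. In this sketch, which assumes 8-bit SRAM addresses, the variable q1 plays the role of register Q1, and the hypothetical function walk_stack follows the stored next-item addresses until it reaches the invalid address 0x00.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define END_OF_LIST 0x00     /* pre-determined invalid address */

/* Simulated 256-location SRAM: each stack location holds the address
 * of the next item on the stack. */
static uint8_t sram[256];
static uint8_t q1;           /* stands in for register Q1 (first item) */

/* Build the example stack of FIG. 5A: 0xC5 -> 0xA1 -> 0xE9 -> end. */
static void build_fig5a(void)
{
    q1 = 0xC5;
    sram[0xC5] = 0xA1;
    sram[0xA1] = 0xE9;
    sram[0xE9] = END_OF_LIST;
}

/* Follow the links from q1, copying at most max item addresses into out;
 * returns how many addresses were copied. */
static size_t walk_stack(uint8_t *out, size_t max)
{
    size_t n = 0;
    for (uint8_t a = q1; a != END_OF_LIST && n < max; a = sram[a])
        out[n++] = a;
    return n;
}
```

Walking this example visits 0xC5, 0xA1 and 0xE9 in that order and stops at the 0x00 terminator.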

Additionally, the addresses of the items (0xC5,
0xA1, and 0xE9) on the stack 60 are pointers to stack
buffers 61a, 61b, 61c contained within SDRAM 16a. A
pointer to a buffer is pushed onto the stack by thread
41 so that the buffer is available for use by other
processing threads. A buffer is popped by thread 42 to
allocate the buffer for use by thread 42. The pointers
are used as an address base to access memory locations in
the buffers.

In addition to stack buffers 61a-c, SDRAM 16a also
contains processing buffer 62, which is allocated to
thread 41. The pointer to processing buffer 62 is not on
the stack because it is not available for allocation by
other threads. Thread 41 may later push a pointer to the
processing buffer 62 onto the stack when it no longer
needs the buffer 62.
Although the stack will be discussed with reference
to the buffer management scheme above, it can be used
without buffers. Referring to FIG. 5B, the SRAM
locations 0xC5, 0xA1, and 0xE9 may, respectively, contain
data 70a, 70b, and 70c in addition to an address of the
next item on the list. Such a scheme may be used to
store smaller units of data 70a-c on the stack. In such
a scheme, the control logic would assign a memory
location within the SRAM for storing the unit of data
(datum) that is to be pushed onto the stack. The datum
pushed onto the stack may be text, numerical data, or
even an address or pointer to another memory location.
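The FIG. 5B variant, in which each simulated SRAM location holds a datum as well as the address of the next item, can be sketched as follows. How the control logic chooses a free SRAM location is not specified in the text; the bump allocator used here, which never reuses locations, is purely an assumption for illustration, as are the names push_datum and pop_datum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define END_OF_LIST 0x00

/* Each simulated SRAM location holds a datum plus the next-item address. */
typedef struct {
    uint32_t datum;
    uint8_t next;
} sram_word;

static sram_word sram[256];
static uint8_t top = END_OF_LIST;   /* stands in for the stack register */
static uint8_t next_free = 0x01;    /* hypothetical allocator; never reuses */

/* Store value in a newly assigned location and make it the top item. */
static bool push_datum(uint32_t value)
{
    if (next_free == END_OF_LIST)   /* counter wrapped: no locations left */
        return false;
    uint8_t loc = next_free++;
    sram[loc].datum = value;
    sram[loc].next = top;           /* link to the previous top item */
    top = loc;
    return true;
}

/* Remove the top item and hand its datum back through *value. */
static bool pop_datum(uint32_t *value)
{
    if (top == END_OF_LIST)
        return false;
    *value = sram[top].datum;
    top = sram[top].next;
    return true;
}
```

A real design would recycle popped locations, for instance through a second free list, rather than exhausting them as this sketch does.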

Referring to FIG. 6A, to pop a datum off the stack
stored in SRAM register Q1, thread 42 executes 101 the
instruction "POP #1". The pop instruction is part of
the instruction set of the MicroEngines 22. The pop
instruction is transmitted to control logic 51 over bus
55 for stack processing. Control logic 51 decodes 102
the pop instruction. The control logic also determines
103 the register that contains a pointer to the stack
that is referred to in the instruction, based on the
argument of the pop instruction. Since the argument to
the pop instruction is #1, the corresponding register
is Q1. The control logic 51 returns 104 the contents of
the Q1 register to the context of processing thread 42.
The stack of FIG. 5A would return "0xC5". Processing
thread 42 receives 107 the contents of the Q1 register,
which is "0xC5", and uses 108 the received content to
access data from the corresponding stack buffer 61b by
appending a suffix to the content.

Control logic 51 reads 105 the content (0xA1) of the
address (0xC5) stored in the Q1 register. Control logic
51 stores 106 the read content (0xA1) in the Q1 register
to indicate that 0xC5 has been removed from the stack
and 0xA1 is now the item at the top of the stack.
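Steps 104 through 106 can be modelled in C as a sketch. The arrays sram and q are hypothetical stand-ins for SRAM 16b and the SRAM registers 50, and the argument #1 of "POP #1" is assumed to map directly to index 1; decoding and the bus 55 transfer are not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define END_OF_LIST 0x00
#define NUM_REGS 8u

static uint8_t sram[256];     /* simulated SRAM holding the linked list */
static uint8_t q[NUM_REGS];   /* simulated stack registers; q[1] ~ Q1 */

/* Pop from the stack whose head is in register `reg`. */
static uint8_t stack_pop(unsigned reg)
{
    assert(reg < NUM_REGS);
    uint8_t head = q[reg];          /* 104: value returned to the thread */
    if (head != END_OF_LIST)
        q[reg] = sram[head];        /* 105-106: next item becomes the top */
    return head;
}
```

Starting from the FIG. 5A state, one pop returns 0xC5 and leaves 0xA1 in Q1, matching the state shown in FIG. 6B.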

Referring to FIG. 6B, the state of the stack after
the operations of FIG. 6A will be described. As shown,
the register Q1 now contains the address 0xA1, which was
previously the address of the second item on the stack.
Additionally, the location that was previously stack
buffer 61b (in FIG. 5A) is now processing buffer 65,
which is used by thread 42. Thus, thread 42 has removed
stack buffer 61b from the stack 60 and allocated the
buffer 61b for its own use.

Referring to FIG. 7A, the process of adding a
buffer to the stack will be described. Thread 41 pushes
processing buffer 62 (shown in FIG. 6B) onto the stack by
executing 201 the instruction "PUSH #1 0x01". The
argument 0x01 is a pointer to the buffer 62 because it is
a prefix that is common to the address space of the
locations in the buffer. The push instruction is
transmitted to control logic 51 over the bus 55.

Upon receiving the push instruction, the control
logic 51 decodes 202 the instruction and determines 203
the SRAM register corresponding to the instruction, based
on the second argument of the push instruction. Since
the second argument is #1, the corresponding register
is Q1. The control logic 51 determines the address to be
pushed from the third argument (0x01) of the push
instruction. The control logic determines 205 the
content of the Q1 register by reading the value of the
register location. The value 0xA1 is the content of the
Q1 register in the stack of FIG. 6B. The control logic
stores 206 the content (0xA1) of the Q1 register in the
SRAM location whose address is the push address (0x01).
The control logic then stores 207 the push address (0x01)
in the Q1 register.
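Steps 205 through 207 can be sketched in the same simulated form. The arrays sram and q are assumed stand-ins for SRAM 16b and the SRAM registers 50, with index 1 playing the role of Q1; this is an illustrative model, not the control logic 51 itself.

```c
#include <assert.h>
#include <stdint.h>

#define END_OF_LIST 0x00
#define NUM_REGS 8u

static uint8_t sram[256];     /* simulated SRAM holding the linked list */
static uint8_t q[NUM_REGS];   /* simulated stack registers; q[1] ~ Q1 */

/* Push push_addr onto the stack whose head is in register `reg`. */
static void stack_push(unsigned reg, uint8_t push_addr)
{
    assert(reg < NUM_REGS);
    assert(push_addr != END_OF_LIST);   /* 0x00 marks the end of the list */
    uint8_t head = q[reg];       /* 205: read the current top of stack */
    sram[push_addr] = head;      /* 206: new item links to the old top */
    q[reg] = push_addr;          /* 207: new item becomes the top */
}
```

Applied to the FIG. 6B state (Q1 holding 0xA1), pushing 0x01 leaves 0x01 in Q1 and 0xA1 stored at location 0x01, which is the arrangement of FIG. 7B.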

Referring to FIG. 7B, the contents of the stack
after the operations of FIG. 7A will be described. As
shown, the SRAM register Q1 contains the address of the
first location on the stack, which is now 0x01. The
address of the first location on the stack is also the
address of stack buffer 61d, which was previously
processing buffer 62 used by thread 41. The location
0xA1, which was previously the first item on the stack,
is now the second item on the stack. Thus, thread 41
adds stack buffer 61d onto the stack to make it available
for allocation to other threads. Thread 42 can later
allocate the stack buffer 61d for its own use by popping
it off the stack, as previously described for FIG. 6A.

Referring to FIG. 8, a second stack 60b (shown in
phantom) may be implemented in the same stack module by
using a second SRAM control register to store the address
of the first element in the second stack 60b. The second
stack may be used to manage a separate set of memory
buffers, for example, within SRAM 16b or SDRAM 16a. A
first stack 60a has the address of the first element on
the stack 60a stored in SRAM register Q1. Additionally,
a second stack 60b has the address of its first element
stored in register Q6. The first stack 60a is identical
to the stack 60 in FIG. 7B. The second stack 60b is
similar to the previously described stacks.
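Because each stack is identified only by its head register, several stacks can share one SRAM. The following sketch, a hypothetical model with eight head registers in the array q, keeps one stack in q[1] (standing in for Q1) and another in q[6] (standing in for Q6); it assumes the two stacks never share an SRAM address.

```c
#include <assert.h>
#include <stdint.h>

#define END_OF_LIST 0x00
#define NUM_REGS 8u

static uint8_t sram[256];      /* one shared store for every stack's links */
static uint8_t q[NUM_REGS];    /* one head register per stack */

/* Push addr onto the stack headed by register reg. */
static void push(unsigned reg, uint8_t addr)
{
    assert(reg < NUM_REGS && addr != END_OF_LIST);
    sram[addr] = q[reg];
    q[reg] = addr;
}

/* Pop from the stack headed by register reg; END_OF_LIST if empty. */
static uint8_t pop(unsigned reg)
{
    assert(reg < NUM_REGS);
    uint8_t head = q[reg];
    if (head != END_OF_LIST)
        q[reg] = sram[head];
    return head;
}
```

Operations on one stack touch only its own register and its own linked-list locations, so the two stacks evolve independently.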

Other embodiments are within the scope of the
following claims. Although the stack 60 (shown in
FIG. 5A) stores the pointer to the first element in a
register Q1, the linked list in SRAM 16b and the buffers
in SDRAM 16a, any of the stack module elements could be
stored in any memory location. For example, they could
all be stored in SRAM 16b or SDRAM 16a.

Other embodiments may implement the stack in a
contiguous address space instead of using a linked list.
The size of the buffers may be varied by using pointers
(address prefixes) of varying length. For example, a
short pointer is a prefix to more addresses and is,
therefore, a pointer to a larger buffer.
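A short calculation makes the relationship between pointer length and buffer size concrete. Assuming a hypothetical 16-bit address, a pointer of p bits leaves 16 - p suffix bits, so the buffer it names holds 2^(16 - p) locations; an 8-bit pointer names a 256-location buffer and a 4-bit pointer names a 4,096-location buffer. The helper names below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 16u   /* hypothetical address width */

/* Number of locations named by a pointer that is prefix_bits long:
 * the remaining (ADDR_BITS - prefix_bits) bits form the suffix. */
static uint32_t buffer_size(unsigned prefix_bits)
{
    assert(prefix_bits <= ADDR_BITS);
    return UINT32_C(1) << (ADDR_BITS - prefix_bits);
}

/* Nonzero if addr lies in the buffer whose prefix_bits-long pointer is ptr. */
static int in_buffer(uint16_t addr, uint16_t ptr, unsigned prefix_bits)
{
    assert(prefix_bits >= 1 && prefix_bits <= ADDR_BITS);
    return (addr >> (ADDR_BITS - prefix_bits)) == ptr;
}
```

The same address 0xC510 belongs both to the 256-location buffer with 8-bit pointer 0xC5 and to the larger 4,096-location buffer with 4-bit pointer 0xC.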

Alternatively, the stack may be used to manage
resources other than buffers. One possible application
of the stack might be to store pointers to the contexts
of active threads that are not currently operating. When
MicroEngine 22a temporarily sets aside a first active
thread to process a second active thread, it stores the
context of the first active thread in a memory buffer and
pushes a pointer to that buffer on the stack. Any
MicroEngine can resume the processing of the first active
thread by popping the pointer to the memory buffer
containing the context of the first thread and loading
that context. Thus, the stack can be used to manage the
processing of multiple concurrent active threads by
multiple processing engines.
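The context-management use can be sketched as follows. The context structure (a program counter and four registers), the eight context buffers, and the functions suspend and resume are all hypothetical; the point is only that a saved context is reached through a pointer pushed onto, and later popped off, a linked-list stack.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BUFFERS 8

/* Hypothetical saved thread context. */
typedef struct {
    uint32_t pc;
    uint32_t regs[4];
} context;

static context buffers[NUM_BUFFERS];   /* context buffers, indexed by pointer */
static uint8_t next_ptr[NUM_BUFFERS];  /* linked-list links for the stack */
static uint8_t head = 0;               /* pointer 0 is reserved as "empty" */

/* Set a thread aside: save its context in buffer p and push p. */
static void suspend(const context *c, uint8_t p)
{
    assert(p != 0 && p < NUM_BUFFERS);
    buffers[p] = *c;
    next_ptr[p] = head;
    head = p;
}

/* Resume on any engine: pop a pointer and load that context.
 * Returns 0 if no suspended context is available, 1 otherwise. */
static int resume(context *c)
{
    if (head == 0)
        return 0;
    uint8_t p = head;
    head = next_ptr[p];
    *c = buffers[p];
    return 1;
}
```

In a multi-engine setting the push and pop would have to be atomic, which is the role the description assigns to the shared stack module.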
What is claimed is:

1. A method comprising:

pushing a datum onto a stack by a first processing
thread; and

popping the datum off the stack by a second
processing thread.

2. The method of claim 1 wherein the pushing
comprises:

executing a push command on the first processing
thread, the push command having at least one argument,

determining a pointer to a current stack datum,

determining a location associated with an argument
of the push command,

storing the determined pointer at the determined
location,

producing a pointer associated with the determined
location the pointer to the current stack datum.

3. The method of claim 2 wherein determining a
location comprises:

decoding the push command.

4. The method of claim 2 wherein determining a
location comprises:

storing an argument of the pop command in a location
associated with the argument of the push command.

5. The method of claim 2 wherein said push command
is at least one of a processor instruction and an
operating system call.

6. The method of claim 1 wherein popping
comprises:

executing a pop command by the second processing
thread,

determining a pointer to a current stack datum,

returning the determined pointer to the second
processing thread,

retrieving a pointer to a previous stack datum from
a location associated with the pointer to the current
stack datum, and

assigning the retrieved pointer the pointer to the
current stack datum.

7. The method of claim 6 wherein the location
associated with the pointer to the current stack datum is
the location that has an address equal to the value of
the pointer to the current stack datum.

8. The method of claim 6 wherein the location
associated with the pointer to the current stack datum is
the location that has an address equal to the sum of an
offset and the value of the pointer to the current stack
datum.

9. The method of claim 6 wherein the pop command
is at least one of a processor instruction or an
operating system call.

10. The method of claim 1 further comprising:
storing data in a memory buffer that is accessible
using a buffer pointer having the datum that is pushed
onto the stack.

11. The method of claim 1 further comprising:
using the popped datum as a buffer pointer to access
information stored in a memory buffer.

12. The method of claim 1 further comprising: 

a third processing thread pushing a second datum 
onto the stack. 

13. The method of claim 1 further comprising:

a third processing thread popping a second datum off
the stack.

14. A system comprising: 

a stack module that stores data by pushing it onto 
the stack and processing threads can retrieve information 
by popping the information off the stack, 

a first processing thread having a first command 
set, including at least one command for pushing data onto 
the stack, and 

a second processing thread having a second command
set, including at least one command for popping the data
off the stack.

15. The system of claim 14 wherein the first and
second processing threads are executed on a single
processing engine.

16. The system of claim 14 wherein the first and
second processing threads are executed on separate
processing engines.

17. The system of claim 16 wherein the separate
processing engines are implemented on the same integrated
circuit.

18. The system of claim 14 wherein the stack module
and the processing threads are on the same integrated
circuit.

19. The system of claim 14 wherein the first and
second command sets are at least one of a processor
instruction set and an operating system instruction set.

20. The system of claim 14 further comprising a bus
interface for communicating between at least one of the
processing threads and the stack module.

21. A stack module comprising:

control logic that responds to commands from at
least two processing threads, the control logic storing
datum on a stack structure in response to a push command
and retrieving datum from the stack in response to a pop
command.

22. The stack module of claim 21 further comprising
a stack pointer associated with the most recently stored
datum on the stack.

23. The stack module of claim 22 further comprising
a memory location associated with a first datum on the
stack, the memory location including:

a pointer associated with a second datum which was
stored on the stack prior to said first datum.

24. The stack module of claim 22 further comprising
a second stack pointer associated with the most recently
stored datum on a second stack.

25. The stack module of claim 22 wherein the stack
pointer is a register on a processor.

26. The stack module of claim 23 wherein said
memory location includes SRAM memory.

27. The stack module of claim 21 wherein the
commands are processor instructions.

28. The stack module of claim 21 wherein the
commands are operating system instructions.

29. An article comprising a computer-readable
medium which stores computer logic, the computer logic
comprising:

a stack module configured to store data from a first
processing thread by pushing the data onto a stack and to
retrieve the data for a second processing thread by
popping the data off the stack, the stack module being
responsive to a first processing thread command to store
data on the stack and a second processing thread command
to retrieve data from the stack.

30. An article comprising a computer-readable 
medium which stores computer-executable instructions, the 
instructions causing a processor to: 

store data from a first processing thread by
executing an instruction to push the data onto a stack;
and 

retrieve the data for a second processing thread by 
executing an instruction to pop the data from the stack 
for use by the second thread. 
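As a rough illustration of claims 14, 21 and 30, the sketch below models a stack module in Python. It is not the patented hardware: the hypothetical `StackModule` class uses a `threading.Condition` in place of the control logic of claim 21, one thread pushes buffer pointers, and a second thread pops them and uses each one to look up a buffer, as in claim 11. The class name, the blocking behavior of `pop` on an empty stack, and the `buffers` table are assumptions made for the example.

```python
import threading


class StackModule:
    """Hypothetical software stand-in for a stack module shared by threads.

    A condition variable plays the role of the control logic: it serializes
    push and pop commands that arrive from different processing threads.
    """

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def push(self, datum):
        with self._cond:
            self._items.append(datum)
            self._cond.notify()  # wake a thread waiting in pop()

    def pop(self):
        # Assumption: a pop on an empty stack blocks until a datum is pushed.
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.pop()


def first_thread(stack, pointers):
    # Pushes each buffer pointer onto the shared stack.
    for ptr in pointers:
        stack.push(ptr)


def second_thread(stack, buffers, count, results):
    # Pops `count` pointers and uses each one to read a buffer (claim 11).
    for _ in range(count):
        ptr = stack.pop()
        results.append(buffers[ptr])


buffers = {0x01: "buffer at 0x01", 0xA1: "buffer at 0xA1", 0xC5: "buffer at 0xC5"}
stack = StackModule()
results = []

t1 = threading.Thread(target=first_thread, args=(stack, [0xC5, 0xA1, 0x01]))
t2 = threading.Thread(target=second_thread, args=(stack, buffers, 3, results))
t2.start()
t1.start()
t1.join()
t2.join()
```

Because the two threads run concurrently, the order in which `results` fills depends on scheduling; only the set of buffers read is fixed.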




SUBSTITUTE SHEET (RULE 26) 



WO 01/50247 



PCT/US00/34537 



[FIG. 2A: MicroEngine 22f. Context event arbiter 74 receives SEQ#, FBI, SRAM, SDRAM and AMBA event responses; engine controller 72 holds micro-program counters uPC 1 to uPC 4; a decode stage supplies address and immediate data; control store 70 holds 1024 32-bit words. Bus inputs: AMBA[31:0], MBUS[31:0], SBUS[31:0].]

[FIG. 2B: MicroEngine registers: read and write transfer registers (32 SRAM/AMBA and 32 SDRAM registers, 8 each per context) and general purpose registers (64 A GPRs and 64 B GPRs, 16 A and 16 B per context), with a shifter.]

[FIG. 3: thread 41 with data 43 and instruction set 46, ending with "PUSH #1 0x01" (46a); thread 42 with data 44 and instruction set 47, beginning with "POP #1" (47a); data 45; stack module 56, with SDRAM on the MBUS and SRAM on the SRBUS.]

[FIG. 4: processor 12 with MicroEngines 22 and a memory unit that includes an SRAM unit with SRAM registers 50 and control logic 51.]

[FIG. 5A: SRAM registers 50 (Q1 to Q8), with Q1 holding 0xC5; stack 60 occupying SRAM locations 0x01, 0x02, 0xA1 to 0xA3, 0xC4 to 0xC6, 0xE8, 0xE9 and 0xF0; SRAM processing buffer 62 (for thread 41) and stack buffers 61a, 61b and 61c at address ranges 0x010-0x01F, 0x020-0x02F, 0xA10-0xA1F to 0xA30-0xA3F, 0xC40-0xC4F to 0xC60-0xC6F, 0xE80-0xE8F, 0xE90-0xE9F and 0xF00-0xF0F.]

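The linked layout of FIG. 5A can be read as a singly linked list: register Q1 holds the address of the most recently pushed entry, and each SRAM location holds the address of the entry pushed before it (claims 22 and 23). The Python sketch below walks such a list. The link values are taken from the OCR-damaged drawings (0xC5, then 0xA1, then 0xE9, ending at 0x00, consistent with FIGS. 6B and 7B) and should be treated as illustrative; treating 0x00 as the bottom-of-stack marker is an assumption.

```python
BOTTOM = 0x00  # assumed end-of-stack marker


def walk_stack(q1, sram):
    """Return stack entries from most recently pushed to oldest.

    q1   -- contents of the stack register (address of the top entry)
    sram -- mapping from an entry's address to the address of the entry
            pushed before it
    """
    entries = []
    ptr = q1
    while ptr != BOTTOM:
        entries.append(ptr)
        ptr = sram[ptr]
    return entries


# Illustrative state for FIG. 5A: Q1 holds 0xC5.
sram = {0xC5: 0xA1, 0xA1: 0xE9, 0xE9: BOTTOM}
entries = walk_stack(0xC5, sram)
print([hex(e) for e in entries])  # ['0xc5', '0xa1', '0xe9']
```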
[FIG. 5B: alternate stack module: SRAM registers 50 (Q1 to Q8, Q1 holding 0xC5) and stack 60 in SRAM (locations 0x01, 0x02, 0xA1 to 0xA3, 0xC4 to 0xC6, 0xE8, 0xE9 and 0xF0), whose entries are associated with data buffers 70a, 70b and 70c.]

[FIG. 6A: popping a datum. Thread 42 executes a "POP #1" instruction (101); the POP instruction is decoded (102); the pop register is determined (103); the content of register Q1 is returned (104); the content of the address stored in Q1 is read (105) and stored in Q1 (106); thread 42 receives the returned content, uses it to determine the relevant memory, and accesses that memory (107 to 109).]

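The pop sequence of FIG. 6A (steps 104 to 106) amounts to unlinking the head of that list: the content of Q1 is returned to the popping thread, and the link stored at that address becomes the new content of Q1. A minimal Python sketch follows, starting from the illustrative FIG. 5A state. The `LinkedStack` class and the `buffer_range` helper are hypothetical. `buffer_range` encodes an inference from the buffer address ranges drawn in FIG. 5A (for example 0xC50-0xC5F beside entry 0xC5): a popped datum n selects the 16-location buffer starting at n × 0x10. This is one way a thread could use the popped datum as a buffer pointer (claim 11).

```python
BOTTOM = 0x00  # assumed end-of-stack marker


class LinkedStack:
    """Hypothetical model of one stack: register Q1 plus link words in SRAM."""

    def __init__(self, q1, sram):
        self.q1 = q1            # address of the most recently pushed entry
        self.sram = dict(sram)  # entry address -> address of the previous entry

    def pop(self):
        if self.q1 == BOTTOM:
            raise IndexError("pop from empty stack")
        datum = self.q1             # step 104: return the content of Q1
        self.q1 = self.sram[datum]  # steps 105-106: read the link at that address, store it in Q1
        return datum


def buffer_range(datum):
    # Inferred mapping: entry n selects locations n*0x10 .. n*0x10 + 0xF.
    base = datum * 0x10
    return base, base + 0xF


stack = LinkedStack(0xC5, {0xC5: 0xA1, 0xA1: 0xE9, 0xE9: BOTTOM})
popped = stack.pop()
print(hex(popped), hex(stack.q1))              # 0xc5 0xa1
print([hex(a) for a in buffer_range(popped)])  # ['0xc50', '0xc5f']
```

After the pop, Q1 holds 0xA1, which matches the state drawn in FIG. 6B.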
[FIG. 6B: the memory components of FIG. 5A after the pop of FIG. 6A; register Q1 now holds 0xA1.]

[FIG. 7A: pushing a datum. Thread 41 executes a "PUSH #1 0x01" instruction (201); the PUSH instruction is decoded (202); the relevant SRAM register is determined (203); the push address is determined (204); the content of register Q1 is determined (205) and stored at the address corresponding to the push address (206); the push address is stored in Q1 (207).]

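The push sequence of FIG. 7A is the inverse of the pop: the current content of Q1 is written into the location at the push address (steps 205 and 206), and the push address becomes the new content of Q1 (step 207). The Python sketch below reads the "PUSH #1 0x01" instruction as "push address 0x01 onto the stack held in Q1"; that reading, the hypothetical `LinkedStack` class and the 0x00 bottom marker are assumptions. Starting from the post-pop state of FIG. 6B, it reproduces the links that the OCR of FIG. 7B appears to show.

```python
BOTTOM = 0x00  # assumed end-of-stack marker


class LinkedStack:
    """Hypothetical model of one stack: register Q1 plus link words in SRAM."""

    def __init__(self, q1, sram):
        self.q1 = q1            # address of the most recently pushed entry
        self.sram = dict(sram)  # entry address -> address of the previous entry

    def push(self, address):
        if address == BOTTOM:
            raise ValueError("0x00 is reserved as the bottom-of-stack marker")
        self.sram[address] = self.q1  # steps 205-206: store Q1's content at the push address
        self.q1 = address             # step 207: store the push address in Q1


# State after the pop of FIG. 6A: Q1 holds 0xA1.
stack = LinkedStack(0xA1, {0xA1: 0xE9, 0xE9: BOTTOM})
stack.push(0x01)
print(hex(stack.q1), hex(stack.sram[0x01]))  # 0x1 0xa1
```

Because each push and each pop touches only Q1 and one link word, both operations take constant time regardless of stack depth.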
[FIG. 7B: the memory components of FIG. 6B after the push of FIG. 7A; register Q1 holds 0x01, location 0x01 holds 0xA1, and the stack is linked 0x01 → 0xA1 → 0xE9 → 0x00.]

[FIG. 8: memory components implementing two stacks in one stack module; SRAM registers 50 (Q1 to Q8) hold the two stack pointers 0x01 and 0xA3; stack buffers in SRAM and a data buffer in SDRAM.]

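FIG. 8 and claim 24 extend the scheme to two stacks in one module, each with its own stack-pointer register. The Python sketch below assumes that register 1 and register 2 select the two stacks and that both stacks share one table of link words. The `MultiStack` name, the register numbering and the 0x00 bottom marker are hypothetical.

```python
BOTTOM = 0x00  # assumed end-of-stack marker


class MultiStack:
    """Hypothetical model of several stacks sharing one SRAM link table."""

    def __init__(self, num_stacks):
        self.regs = {n: BOTTOM for n in range(1, num_stacks + 1)}  # Q1, Q2, ...
        self.sram = {}  # entry address -> previous entry on the same stack

    def push(self, reg, address):
        self.sram[address] = self.regs[reg]  # link the new entry to the old top
        self.regs[reg] = address             # the new entry becomes the top

    def pop(self, reg):
        datum = self.regs[reg]
        if datum == BOTTOM:
            raise IndexError(f"pop from empty stack Q{reg}")
        self.regs[reg] = self.sram.pop(datum)  # unlink and free the location
        return datum


stacks = MultiStack(2)
stacks.push(1, 0xA1)
stacks.push(1, 0x01)
stacks.push(2, 0xA3)
print(hex(stacks.regs[1]), hex(stacks.regs[2]))  # 0x1 0xa3
```

Keeping one register per stack lets the two stacks grow independently in the same memory, since each link word belongs to exactly one stack.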
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
12 July 2001 (12.07.2001) 







(10) International Publication Number 

WO 01/50247 A3 



(51) International Patent Classification 7: G06F 9/30, 9/46

(21) International Application Number: PCT/US00/34537 

(22) International Filing Date: 

19 December 2000 (19.12.2000)



(25) Filing Language: English

(26) Publication Language: English



(30) Priority Data:

09/479,377 5 January 2000 (05.01.2000) US



(63) Related by continuation (CON) or continuation-in-part 
(CIP) to earlier application: 

US 09/479,377 (CON)

Filed on 5 January 2000 (05.01.2000)



(71) Applicant (for all designated States except US): INTEL
CORPORATION [US/US]; 2200 Mission College Boulevard,
Santa Clara, CA 95052 (US).

(72) Inventors; and 

(75) Inventors/Applicants (for US only): WOLRICH,
Gilbert [US/US]; 4 Cider Mill Road, Framingham, MA
01701 (US). ADILETTA, Matthew, J. [US/US]; 20 Monticello
Drive, Worcester, MA 01603 (US). WHEELER,
William [US/US]; 9 Darlene Drive, Southborough, MA
01772 (US). CUTTER, Daniel [US/US]; 14 Walnut
Street, Townsend, MA 01469 (US). BERNSTEIN, Debra
[US/US]; 443 Peakham Road, Sudbury, MA 01776 (US).

(74) Agent: HARRIS, Scott, C.; Fish & Richardson P.C., Suite
500, 4350 La Jolla Village Drive, San Diego, CA 92122
(US).






(54) Title: MEMORY SHARED BETWEEN PROCESSING THREADS 



[Representative drawing: the SRAM registers, processing buffer 62 for thread 41 and stack buffers 61a to 61c of FIG. 5A.]


(57) Abstract: A method includes pushing 
a datum onto a stack by a first processor and 
popping the datum off the stack by a second 
processor.






(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ,
DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR,
HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR,
LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ,
NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM,
TR, TT, TZ, UA, UG, US, UZ, VN, YU, ZA, ZW.



(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

Published: 

— with international search report 

(88) Date of publication of the international search report: 

31 January 2002 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



INTERNATIONAL SEARCH REPORT 



International Application No

PCT/US 00/34537 



A. CLASSIFICATION OF SUBJECT MATTER

IPC 7 G06F9/30 G06F9/46 



According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED 



Minimum documentation searched (classification system followed by classification symbols)

IPC 7 G06F



Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched



Electronic data base consulted during the international search (name of data base and, where practical, search terms used)

EPO-Internal, INSPEC 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* Citation of document, with indication, where appropriate, of the relevant passages

US 5 905 889 A (WILHELM JR GEORGE WILLIAM)
18 May 1999 (1999-05-18)
column 4, line 53 - line 67
column 7, line 16 - line 35
column 8, line 20 - line 30
column 8, line 60 - column 9, line 32

EP 0 809 180 A (SEIKO EPSON CORP)
26 November 1997 (1997-11-26)
page 3, line 54 - line 55

Relevant to claim No.:
1, 10-13, 29
14-19, 21-23, 27, 30
5-8
14-19, 21-23, 27, 30



Further documents are listed in the continuation of box C. 



Patent family members are listed in annex. 



* Special categories of cited documents:

"A" document defining the general state of the art which is not
    considered to be of particular relevance
"E" earlier document but published on or after the international
    filing date
"L" document which may throw doubts on priority claim(s) or
    which is cited to establish the publication date of another
    citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or
    other means
"P" document published prior to the international filing date but
    later than the priority date claimed
"T" later document published after the international filing date
    or priority date and not in conflict with the application but
    cited to understand the principle or theory underlying the
    invention
"X" document of particular relevance; the claimed invention
    cannot be considered novel or cannot be considered to
    involve an inventive step when the document is taken alone
"Y" document of particular relevance; the claimed invention
    cannot be considered to involve an inventive step when the
    document is combined with one or more other such documents,
    such combination being obvious to a person skilled
    in the art
"&" document member of the same patent family



Date of the actual completion of the international search 

6 July 2001 



Date of mailing of the international search report

06/08/2001

Name and mailing address of the ISA

European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016

Authorized officer

Moraiti, M

Form PCT/ISA/210 (second sheet) (July 1992)





C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT

Category* Citation of document, with indication, where appropriate, of the relevant passages

A   HYDE R L: "Overview of memory management"
    BYTE, APRIL 1988, USA,
    vol. 13, no. 4, pages 219-225,
    XP002162801
    ISSN: 0360-5280
    figure 1
    Relevant to claim No.: 1, 14, 21, 29, 30



Information on patent family members

Patent document           Publication   Patent family     Publication
cited in search report    date          member(s)         date
US 5905889                18-05-1999    US 6233630 B      15-05-2001
EP 0809180                26-11-1997    JP 10091443 A     10-04-1998



